Semi-supervised learning

In computer science, semi-supervised learning is a class of machine learning techniques that make use of both labeled and unlabeled data for training - typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. The acquisition of labeled data for a learning problem often requires a skilled human agent to manually classify training examples. The cost associated with the labeling process thus may render a fully labeled training set infeasible, whereas acquisition of unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value.

One example of a semi-supervised learning technique is co-training, in which two or possibly more learners are each trained on a set of examples, but with each learner using a different, and ideally independent, set of features for each example.

An alternative approach is to model the joint probability distribution of the features and the labels. For the unlabelled data the labels can then be treated as 'missing data'. Techniques that handle missing data, such as Gibbs sampling or the EM algorithm, can then be used to estimate the parameters of the model.

See also

References

  1. Abney, S., Semisupervised Learning for Computational Linguistics. Chapman & Hall/CRC, 2008.
  2. Blum, A., Mitchell, T. Combining labeled and unlabeled data with co-training. COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann, 1998, p. 92-100.
  3. Chapelle, O., B. Schölkopf and A. Zien: Semi-Supervised Learning. MIT Press, Cambridge, MA (2006). Further information.
  4. Huang T-M., Kecman V., Kopriva I. [1], Kernel Based Algorithms for Mining Huge Data Sets, Supervised, Semisupervised and Unsupervised Learning, Springer-Verlag, Berlin, Heidelberg, 260 pp. 96 illus., Hardcover, ISBN 3-540-31681-7, 2006.
  5. O'Neill, T. J. (1978) "Normal discrimination with unclassified observations". Journal of the American Statistical Association, 73, 821–826.
  6. Theodoridis S., Koutroumbas K. (2009) Pattern Recognition, 4th Edition, Academic Press, ISBN: 978-1-59749-272-0.
  7. Zhu, X. Semi-supervised learning literature survey.
  8. Zhu, X., Goldberg, A. (2009) Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3, 1-130. Morgan & Claypool Publishers, 2009.
  9. Song, E. et al. [2], Semi-supervised multi-class Adaboost by exploiting unlabeled data, Expert Systems with Applications, Vol. 38, Issue 6, p. 6720-6726, June 2011.